Explicit Neural Word Representation

Authors

  • Navid Rekabsaz
  • Mihai Lupu
  • Allan Hanbury
  • Bhaskar Mitra
Abstract

Recent advances in word embedding provide significant benefit to various information processing tasks. Yet these dense representations and their estimation of word-to-word relatedness remain difficult to interpret and hard to analyze. As an alternative, explicit word representations propose vectors whose dimensions are easily interpretable, and recent methods show competitive performance with dense vectors. We introduce a neural-based explicit representation, rooted in the conceptual ideas of the word2vec Skip-Gram model. The method provides interpretable explicit vectors while keeping the effectiveness of the Skip-Gram model. The evaluation of various explicit representations on word association collections shows that the newly proposed method outperforms the state-of-the-art explicit representations when tasked with ranking highly similar terms. As a case study on the use of our explicit representation, we examine the degree of gender bias in the English language (as used in Wikipedia) with regard to various occupations. By measuring the bias toward explicit Female and Male factors, the study quantifies a general tendency of the majority of occupations toward male and a strong bias of a few specific occupations (e.g. nurse) toward female.

ACM Reference format: Navid Rekabsaz, Mihai Lupu, Allan Hanbury and Bhaskar Mitra. 2017. Explicit Neural Word Representation. In Proceedings of ACM International Conference on Information and Knowledge Management, Singapore, 2017.
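The bias measurement described in the case study can be sketched as comparing an occupation's vector against explicit Female and Male factors, e.g. by cosine similarity. The toy three-dimensional vectors and the `gender_bias` helper below are illustrative assumptions for the sketch, not the paper's actual representation or values:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy explicit vectors: each dimension stands for an interpretable
# context word (illustrative values only, not trained on Wikipedia).
female = [1.0, 0.0, 0.2]
male   = [0.0, 1.0, 0.2]
occupations = {
    "nurse":    [0.9, 0.1, 0.3],
    "engineer": [0.1, 0.8, 0.4],
}

def gender_bias(vec):
    """Positive -> leans toward the Female factor, negative -> Male."""
    return cosine(vec, female) - cosine(vec, male)

for occ, vec in occupations.items():
    print(occ, round(gender_bias(vec), 3))
```

With these toy values, "nurse" yields a positive score and "engineer" a negative one, mirroring the kind of per-occupation bias quantification the abstract reports.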


Related Articles

Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective

Recently significant advances have been witnessed in the area of distributed word representations based on neural networks, which are also known as word embeddings. Among the new word embedding models, skip-gram negative sampling (SGNS) in the word2vec toolbox has attracted much attention due to its simplicity and effectiveness. However, the principles of SGNS remain not well understood, except...


Text Embedding with Advanced Recurrent Neural Model

Embedding methods have become a popular way to handle unstructured data, such as words and text. Word embedding, providing computation-friendly representations of word similarity, is almost one of the standard solutions for various text mining tasks. Many recent studies focusing on word embedding try to generate a more comprehensive representation for each word that incorporates task-spe...


A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...


Toward Incorporation of Relevant Documents in word2vec

Recent advances in neural word embedding provide significant benefit to various information retrieval tasks. However, as shown by recent studies, adapting the embedding models for the needs of IR tasks can bring considerable further improvements. The embedding models in general define term relatedness by exploiting the terms' co-occurrences in short-window contexts. An alternative (and well-...


Large Vocabulary Recognition of On-Line Handwritten Cursive Words

This paper presents a writer-independent system for large vocabulary recognition of on-line handwritten cursive words. The system first uses a filtering module, based on simple letter features, to quickly reduce a large reference dictionary (lexicon) to a more manageable size; the reduced lexicon is subsequently fed to a recognition module. The recognition module uses a temporal representation of...




Journal:

Volume   Issue

Pages  -

Publication date: 2017